Overview

Dataset statistics

Statistic                        Dataset A   Dataset B
Number of variables              12          12
Number of observations           446         446
Missing cells                    427         436
Missing cells (%)                8.0%        8.1%
Duplicate rows                   0           0
Duplicate rows (%)               0.0%        0.0%
Total size in memory             45.3 KiB    45.3 KiB
Average record size in memory    104.0 B     104.0 B
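
The percentages in the table above can be cross-checked from the raw counts. The sketch below is not part of the report; it assumes that "Missing cells (%)" is the share of empty cells among all rows times columns, and that 1 KiB is 1024 bytes. The function names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 12
n_observations = 446
missing_cells = {"Dataset A": 427, "Dataset B": 436}
total_size_kib = 45.3


def missing_cells_pct(missing: int, rows: int, cols: int) -> float:
    """Share of empty cells among all rows * cols cells, in percent."""
    return 100.0 * missing / (rows * cols)


def avg_record_size_bytes(total_kib: float, rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / rows


for name, missing in missing_cells.items():
    pct = missing_cells_pct(missing, n_observations, n_variables)
    print(f"{name}: {pct:.1f}% missing cells")

print(f"{avg_record_size_bytes(total_size_kib, n_observations):.1f} B per record")
```

Under these assumptions the results agree with the reported 8.0%, 8.1% and 104.0 B.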

Variable types

Type           Dataset A   Dataset B
Numeric        5           4
Categorical    4           5
Text           3           3

Alerts

Alert type          Dataset A                               Dataset B
Missing             Age has 89 (20.0%) missing values       Age has 93 (20.9%) missing values
Missing             Cabin has 337 (75.6%) missing values    Cabin has 343 (76.9%) missing values
Unique              PassengerId has unique values           PassengerId has unique values
Unique              Name has unique values                  Name has unique values
Zeros               SibSp has 296 (66.4%) zeros             SibSp has 310 (69.5%) zeros
Zeros               Parch has 343 (76.9%) zeros             Alert not present in this dataset
Zeros               Fare has 7 (1.6%) zeros                 Fare has 5 (1.1%) zeros
High correlation    Alert not present in this dataset       Sex is highly overall correlated with Survived
High correlation    Alert not present in this dataset       Survived is highly overall correlated with Sex
Imbalance           Alert not present in this dataset       Parch is highly imbalanced (51.8%)
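
The 51.8% in the imbalance alert can be reproduced from the Dataset B value counts of Parch (338, 60, 41, 4, 3) listed further down in this report. The sketch below assumes the score is one minus the Shannon entropy normalised by the log of the number of classes. That formula is an assumption: it matches the reported figure, but the report itself does not state how the score is defined. The function name is illustrative.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits over the observed class frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Parch value counts for Dataset B, copied from this report.
parch_b_counts = [338, 60, 41, 4, 3]
print(f"{100 * imbalance_score(parch_b_counts):.1f}%")
```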

Reproduction

Item                      Dataset A                     Dataset B
Analysis started          2025-03-21 10:30:33.770721    2025-03-21 10:30:35.912971
Analysis finished         2025-03-21 10:30:35.910162    2025-03-21 10:30:37.483535
Duration                  2.14 seconds                  1.57 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
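
The reported durations follow from the start and finish timestamps. A minimal sketch of that arithmetic, using only the standard library; the helper name is illustrative:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two timestamps written in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


# Timestamps copied from the "Reproduction" table.
runs = {
    "Dataset A": ("2025-03-21 10:30:33.770721", "2025-03-21 10:30:35.910162"),
    "Dataset B": ("2025-03-21 10:30:35.912971", "2025-03-21 10:30:37.483535"),
}

for name, (started, finished) in runs.items():
    print(f"{name}: {duration_seconds(started, finished):.2f} seconds")
```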

Variables

PassengerId
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            448.47982    454.15919
Minimum         1            1
Maximum         891          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      1           1
5-th percentile              44.5        38.75
Q1                           213         232.25
median                       445         455.5
Q3                           689         689.5
95-th percentile             859.5       852
Maximum                      891         891
Range                        890         890
Interquartile range (IQR)    476         457.25
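
Range and interquartile range are derived from the other rows of the quantile table: Range = Maximum - Minimum and IQR = Q3 - Q1. A short sketch that recomputes them from the PassengerId figures; the dictionary layout is only for illustration:

```python
# Quantiles copied from the PassengerId "Quantile statistics" table.
quantiles = {
    "Dataset A": {"min": 1, "q1": 213, "q3": 689, "max": 891},
    "Dataset B": {"min": 1, "q1": 232.25, "q3": 689.5, "max": 891},
}


def value_range(q: dict) -> float:
    """Range = maximum - minimum."""
    return q["max"] - q["min"]


def iqr(q: dict) -> float:
    """Interquartile range = Q3 - Q1."""
    return q["q3"] - q["q1"]


for name, q in quantiles.items():
    print(f"{name}: range={value_range(q)}, IQR={iqr(q)}")
```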

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 265.00801        258.87042
Coefficient of variation (CV)      0.59090285       0.5699993
Kurtosis                           -1.2651205       -1.2221746
Mean                               448.47982        454.15919
Median Absolute Deviation (MAD)    236.5            229
Skewness                           -0.0033682188    -0.027847904
Sum                                200022           202555
Variance                           70229.243        67013.896
Monotonicity                       Not monotonic    Not monotonic
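
Several rows of the descriptive table are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks both identities against the PassengerId figures, allowing for the rounding of the printed values. The tolerance is an arbitrary choice.

```python
import math

# Figures copied from the PassengerId "Descriptive statistics" table.
stats = {
    "Dataset A": {"mean": 448.47982, "std": 265.00801, "variance": 70229.243, "cv": 0.59090285},
    "Dataset B": {"mean": 454.15919, "std": 258.87042, "variance": 67013.896, "cv": 0.5699993},
}

REL_TOL = 1e-6  # loose enough to absorb rounding in the printed figures

for name, s in stats.items():
    cv_ok = math.isclose(s["std"] / s["mean"], s["cv"], rel_tol=REL_TOL)
    var_ok = math.isclose(s["std"] ** 2, s["variance"], rel_tol=REL_TOL)
    print(f"{name}: CV consistent={cv_ok}, variance consistent={var_ok}")
```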
[Figure: Histogram with fixed size bins (bins=50)]
Common values, Dataset A
Value                 Count   Frequency (%)
293                   1       0.2%
361                   1       0.2%
278                   1       0.2%
776                   1       0.2%
674                   1       0.2%
627                   1       0.2%
868                   1       0.2%
125                   1       0.2%
682                   1       0.2%
8                     1       0.2%
Other values (436)    436     97.8%

Common values, Dataset B
Value                 Count   Frequency (%)
397                   1       0.2%
279                   1       0.2%
159                   1       0.2%
555                   1       0.2%
32                    1       0.2%
414                   1       0.2%
491                   1       0.2%
589                   1       0.2%
540                   1       0.2%
542                   1       0.2%
Other values (436)    436     97.8%

Smallest values, Dataset A
Value   Count   Frequency (%)
1       1       0.2%
3       1       0.2%
5       1       0.2%
6       1       0.2%
8       1       0.2%
11      1       0.2%
12      1       0.2%
14      1       0.2%
15      1       0.2%
16      1       0.2%

Smallest values, Dataset B
Value   Count   Frequency (%)
1       1       0.2%
3       1       0.2%
5       1       0.2%
8       1       0.2%
12      1       0.2%
13      1       0.2%
14      1       0.2%
16      1       0.2%
18      1       0.2%
21      1       0.2%

Survived
Categorical

Statistic       Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
0        276         281
1        170         165

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       446         446
Distinct characters    2           2
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
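
As an illustration of what a character property is, Python's standard `unicodedata` module exposes the Unicode general category of each code point. This sketch is independent of how the report derives its category, script and block columns, which appear as (unknown) in this report. The standard library covers categories only, not scripts or blocks. The helper name is illustrative.

```python
import unicodedata
from collections import Counter


def category_counts(values: list[str]) -> Counter:
    """Count Unicode general categories over every character in the values."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# The five Survived sample rows of Dataset A, as strings.
sample = ["0", "0", "0", "1", "0"]
print(category_counts(sample))  # Counter({'Nd': 5}), i.e. decimal digits
```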

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    0           0
2nd row    0           0
3rd row    0           1
4th row    1           1
5th row    0           0

Common Values

Dataset A
Value   Count   Frequency (%)
0       276     61.9%
1       170     38.1%

Dataset B
Value   Count   Frequency (%)
0       281     63.0%
1       165     37.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
Most occurring characters

Dataset A
Value   Count   Frequency (%)
0       276     61.9%
1       170     38.1%

Dataset B
Value   Count   Frequency (%)
0       281     63.0%
1       165     37.0%

Most occurring categories, scripts and blocks

All 446 characters (100.0%) in each dataset fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Pclass
Categorical

Statistic       Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
3        254         237
1        104         111
2        88          98

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       446         446
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    3           3
2nd row    2           3
3rd row    3           3
4th row    2           1
5th row    2           2

Common Values

Dataset A
Value   Count   Frequency (%)
3       254     57.0%
1       104     23.3%
2       88      19.7%

Dataset B
Value   Count   Frequency (%)
3       237     53.1%
1       111     24.9%
2       98      22.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
Most occurring characters

Dataset A
Value   Count   Frequency (%)
3       254     57.0%
1       104     23.3%
2       88      19.7%

Dataset B
Value   Count   Frequency (%)
3       237     53.1%
1       111     24.9%
2       98      22.0%

Most occurring categories, scripts and blocks

All 446 characters (100.0%) in each dataset fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Name
Text

Statistic       Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       57           67
Median length    48           50
Mean length      26.784753    26.701794
Min length       13           12

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       11946       11909
Distinct characters    60          59
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        446         446
Unique (%)    100.0%      100.0%

Sample

Row        Dataset A                                  Dataset B
1st row    Skoog, Mr. Wilhelm                         Rice, Master. Eric
2nd row    Parkes, Mr. Francis "Frank"                Smiljanic, Mr. Mile
3rd row    Myhrman, Mr. Pehr Fabian Oliver Malkolm    Ohman, Miss. Velin
4th row    Wilhelms, Mr. Charles                      Spencer, Mrs. William Augustus (Marie Eugenie)
5th row    Kirkland, Rev. Charles Leonard             Cunningham, Mr. Alfred Fleming
Dataset A
Value                 Count   Frequency (%)
mr                    258     14.3%
miss                  96      5.3%
mrs                   62      3.4%
william               30      1.7%
master                20      1.1%
george                15      0.8%
henry                 15      0.8%
john                  13      0.7%
james                 12      0.7%
anna                  11      0.6%
Other values (892)    1271    70.5%

Dataset B
Value                 Count   Frequency (%)
mr                    260     14.5%
miss                  95      5.3%
mrs                   61      3.4%
william               33      1.8%
john                  23      1.3%
henry                 22      1.2%
master                19      1.1%
charles               15      0.8%
thomas                15      0.8%
mary                  13      0.7%
Other values (876)    1242    69.1%
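
The word frequencies above are lower-cased tokens such as "mr" and "miss". The sketch below shows one way counts of this shape could be produced: split on whitespace, lower-case, and strip punctuation from the ends of each token. This tokenisation rule is an assumption chosen because it is consistent with the tokens shown in this report; it is not confirmed to be what the report generator does. The input is the five Dataset A sample rows.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Count lower-cased whitespace tokens with edge punctuation removed."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts


# Name sample rows of Dataset A, copied from the "Sample" table.
names = [
    "Skoog, Mr. Wilhelm",
    'Parkes, Mr. Francis "Frank"',
    "Myhrman, Mr. Pehr Fabian Oliver Malkolm",
    "Wilhelms, Mr. Charles",
    "Kirkland, Rev. Charles Leonard",
]

counts = word_counts(names)
print(counts.most_common(2))  # [('mr', 4), ('charles', 2)]
```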

Most occurring characters

Dataset A
Value                Count   Frequency (%)
(space)              1359    11.4%
r                    968     8.1%
e                    853     7.1%
a                    825     6.9%
s                    656     5.5%
i                    654     5.5%
n                    647     5.4%
M                    554     4.6%
l                    526     4.4%
o                    518     4.3%
Other values (50)    4386    36.7%

Dataset B
Value                Count   Frequency (%)
(space)              1353    11.4%
r                    962     8.1%
e                    838     7.0%
a                    818     6.9%
n                    666     5.6%
i                    654     5.5%
s                    636     5.3%
M                    544     4.6%
l                    532     4.5%
o                    503     4.2%
Other values (49)    4403    37.0%

Most occurring categories, scripts and blocks

All characters (11946 in Dataset A, 11909 in Dataset B; 100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Sex
Categorical

Statistic       Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Value     Dataset A   Dataset B
male      285         288
female    161         158

Length

Statistic        Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.7219731    4.7085202
Min length       4            4

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       2106        2100
Distinct characters    5           5
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
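
The total character count and mean length reported for Sex follow directly from the value counts of "male" and "female". A small sketch of that arithmetic; the helper name is illustrative:

```python
def length_stats(value_counts: dict[str, int]) -> tuple[int, float]:
    """Return (total characters, mean length) for a categorical column."""
    n_values = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    return total_chars, total_chars / n_values


# Value counts copied from the Sex section.
sex_counts = {
    "Dataset A": {"male": 285, "female": 161},
    "Dataset B": {"male": 288, "female": 158},
}

for name, counts in sex_counts.items():
    total_chars, mean_length = length_stats(counts)
    print(f"{name}: total characters={total_chars}, mean length={mean_length:.7f}")
```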

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    male        male
2nd row    male        male
3rd row    male        female
4th row    male        female
5th row    male        male

Common Values

Dataset A
Value    Count   Frequency (%)
male     285     63.9%
female   161     36.1%

Dataset B
Value    Count   Frequency (%)
male     288     64.6%
female   158     35.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       607     28.8%
m       446     21.2%
a       446     21.2%
l       446     21.2%
f       161     7.6%

Dataset B
Value   Count   Frequency (%)
e       604     28.8%
m       446     21.2%
a       446     21.2%
l       446     21.2%
f       158     7.5%

Most occurring categories, scripts and blocks

All characters (2106 in Dataset A, 2100 in Dataset B; 100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Age
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        74           73
Distinct (%)    20.7%        20.7%
Missing         89           93
Missing (%)     20.0%        20.9%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.017983    30.095382
Minimum         0.42         0.42
Maximum         80           70
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
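
For Age, the two percentages use different denominators. "Missing (%)" is taken over all 446 rows, while "Distinct (%)" only matches the reported 20.7% if it is taken over the non-missing rows. The second point is an inference from the numbers, not something the report states. A sketch under that assumption, with illustrative function names:

```python
N_ROWS = 446  # number of observations in each dataset


def missing_pct(n_missing: int, n_rows: int = N_ROWS) -> float:
    """Missing values as a percentage of all rows."""
    return 100.0 * n_missing / n_rows


def distinct_pct(n_distinct: int, n_missing: int, n_rows: int = N_ROWS) -> float:
    """Distinct values as a percentage of non-missing rows (assumed denominator)."""
    return 100.0 * n_distinct / (n_rows - n_missing)


# (distinct, missing) pairs for Age, copied from the table above.
age = {"Dataset A": (74, 89), "Dataset B": (73, 93)}

for name, (n_distinct, n_missing) in age.items():
    print(
        f"{name}: missing={missing_pct(n_missing):.1f}%, "
        f"distinct={distinct_pct(n_distinct, n_missing):.1f}%"
    )
```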

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0.42        0.42
5-th percentile              4           6.6
Q1                           21          21
median                       29          28
Q3                           39          38
95-th percentile             54          56
Maximum                      80          70
Range                        79.58       69.58
Interquartile range (IQR)    18          17

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 14.348467        13.849738
Coefficient of variation (CV)      0.47799571       0.46019478
Kurtosis                           0.22239404       -0.075073147
Mean                               30.017983        30.095382
Median Absolute Deviation (MAD)    8                8
Skewness                           0.32565827       0.29994218
Sum                                10716.42         10623.67
Variance                           205.87851        191.81524
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values, Dataset A
Value                Count   Frequency (%)
25                   15      3.4%
22                   14      3.1%
30                   13      2.9%
27                   13      2.9%
31                   12      2.7%
29                   12      2.7%
21                   11      2.5%
19                   11      2.5%
32                   11      2.5%
36                   11      2.5%
Other values (64)    234     52.5%
(Missing)            89      20.0%

Common values, Dataset B
Value                Count   Frequency (%)
24                   19      4.3%
30                   16      3.6%
22                   15      3.4%
25                   14      3.1%
21                   13      2.9%
28                   13      2.9%
35                   12      2.7%
19                   11      2.5%
29                   10      2.2%
20                   10      2.2%
Other values (63)    220     49.3%
(Missing)            93      20.9%

Smallest values, Dataset A
Value   Count   Frequency (%)
0.42    1       0.2%
0.67    1       0.2%
0.83    1       0.2%
1       2       0.4%
2       5       1.1%
3       3       0.7%
4       7       1.6%
5       2       0.4%
6       2       0.4%
7       1       0.2%

Smallest values, Dataset B
Value   Count   Frequency (%)
0.42    1       0.2%
0.67    1       0.2%
0.75    1       0.2%
0.83    1       0.2%
1       2       0.4%
2       4       0.9%
3       2       0.4%
4       4       0.9%
5       1       0.2%
6       1       0.2%

SibSp
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.5470852    0.58744395
Minimum         0            0
Maximum         8            8
Zeros           296          310
Zeros (%)       66.4%        69.5%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0           0
5-th percentile              0           0
Q1                           0           0
median                       0           0
Q3                           1           1
95-th percentile             2.75        3
Maximum                      8           8
Range                        8           8
Interquartile range (IQR)    1           1

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 1.1182965        1.324013
Coefficient of variation (CV)      2.0440993        2.2538543
Kurtosis                           18.378262        15.066292
Mean                               0.5470852        0.58744395
Median Absolute Deviation (MAD)    0                0
Skewness                           3.7213771        3.5973032
Sum                                244              262
Variance                           1.250587         1.7530105
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]
Common values, Dataset A
Value   Count   Frequency (%)
0       296     66.4%
1       111     24.9%
2       16      3.6%
3       9       2.0%
4       8       1.8%
8       4       0.9%
5       2       0.4%

Common values, Dataset B
Value   Count   Frequency (%)
0       310     69.5%
1       93      20.9%
2       13      2.9%
4       10      2.2%
3       9       2.0%
8       7       1.6%
5       4       0.9%
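
Because SibSp has only seven distinct values, its Sum, Mean, Variance and Standard deviation can be recomputed from the value counts alone. The sketch below assumes the report uses the sample variance, dividing by n - 1. That assumption reproduces the reported figures, though the report does not state it. The function name is illustrative.

```python
import math


def moments_from_counts(value_counts: dict[int, int]) -> tuple[int, float, float, float]:
    """Return (sum, mean, sample variance, sample std) from a value -> count table."""
    n = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    mean = total / n
    # Sample variance (n - 1 denominator), assumed to match the report.
    variance = sum(count * (value - mean) ** 2 for value, count in value_counts.items()) / (n - 1)
    return total, mean, variance, math.sqrt(variance)


# SibSp value counts copied from the common values tables.
sibsp = {
    "Dataset A": {0: 296, 1: 111, 2: 16, 3: 9, 4: 8, 8: 4, 5: 2},
    "Dataset B": {0: 310, 1: 93, 2: 13, 4: 10, 3: 9, 8: 7, 5: 4},
}

for name, counts in sibsp.items():
    total, mean, variance, std = moments_from_counts(counts)
    print(f"{name}: sum={total}, mean={mean:.7f}, variance={variance:.6f}, std={std:.6f}")
```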
Smallest values, Dataset A
Value   Count   Frequency (%)
0       296     66.4%
1       111     24.9%
2       16      3.6%
3       9       2.0%
4       8       1.8%
5       2       0.4%
8       4       0.9%

Smallest values, Dataset B
Value   Count   Frequency (%)
0       310     69.5%
1       93      20.9%
2       13      2.9%
3       9       2.0%
4       10      2.2%
5       4       0.9%
8       7       1.6%

Parch
Numeric

Statistic       Dataset A   Dataset B
Distinct        7           5
Distinct (%)    1.6%        1.1%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Value               Dataset A
0                   343
1                   58
2                   35
5                   4
4                   3
Other values (2)    3

Value    Dataset B
0        338
1        60
2        41
3        4
5        3

Length

Max length       1
Median length    1
Mean length      1
Min length       1

Characters and Unicode

Total characters       446
Distinct characters    5
Distinct categories    1
Distinct scripts       1
Distinct blocks        1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        1           0
Unique (%)    0.2%        0.0%

Sample

1st row    1
2nd row    0
3rd row    0
4th row    0
5th row    0

Common Values

Dataset A
Value   Count   Frequency (%)
0       343     76.9%
1       58      13.0%
2       35      7.8%
5       4       0.9%
4       3       0.7%
3       2       0.4%
6       1       0.2%

Dataset B
Value   Count   Frequency (%)
0       338     75.8%
1       60      13.5%
2       41      9.2%
3       4       0.9%
5       3       0.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
Most occurring characters

Dataset B
Value   Count   Frequency (%)
0       338     75.8%
1       60      13.5%
2       41      9.2%
3       4       0.9%
5       3       0.7%

Most occurring categories, scripts and blocks

All 446 characters (100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Ticket
Text

Statistic       Dataset A   Dataset B
Distinct        378         371
Distinct (%)    84.8%       83.2%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.6412556    6.7533632
Min length       3            3

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       2962        3012
Distinct characters    32          31
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        331         316
Unique (%)    74.2%       70.9%

Sample

Row        Dataset A   Dataset B
1st row    347088      382652
2nd row    239853      315037
3rd row    347078      347085
4th row    244270      PC 17569
5th row    219533      239853
Dataset A
Value                 Count   Frequency (%)
pc                    26      4.7%
c.a                   12      2.2%
ca                    8       1.5%
a/5                   7       1.3%
w./c                  5       0.9%
347082                5       0.9%
sc/paris              4       0.7%
soton/o.q             4       0.7%
ston/o                4       0.7%
2                     4       0.7%
Other values (395)    472     85.7%

Dataset B
Value                 Count   Frequency (%)
pc                    34      6.0%
ca                    12      2.1%
c.a                   12      2.1%
a/5                   10      1.8%
2343                  7       1.2%
2                     6       1.1%
w./c                  6       1.1%
ston/o                6       1.1%
sc/paris              5       0.9%
soton/oq              4       0.7%
Other values (387)    469     82.1%

Most occurring characters

Dataset A
Value                Count   Frequency (%)
3                    385     13.0%
1                    339     11.4%
2                    302     10.2%
7                    246     8.3%
4                    237     8.0%
6                    209     7.1%
0                    204     6.9%
5                    193     6.5%
9                    160     5.4%
8                    142     4.8%
Other values (22)    545     18.4%

Dataset B
Value                Count   Frequency (%)
3                    367     12.2%
1                    337     11.2%
2                    300     10.0%
7                    243     8.1%
4                    233     7.7%
6                    225     7.5%
5                    207     6.9%
0                    196     6.5%
9                    175     5.8%
8                    128     4.2%
Other values (21)    601     20.0%

Most occurring categories, scripts and blocks

All characters (2962 in Dataset A, 3012 in Dataset B; 100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Fare
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        176          172
Distinct (%)    39.5%        38.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            30.405914    34.22838
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           7            5
Zeros (%)       1.6%         1.1%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A   Dataset B
Minimum                      0           0
5-th percentile              7.225       7.225
Q1                           7.925       7.8958
median                       14.5        14.5
Q3                           30.5        32.596875
95-th percentile             92.8948     110.8833
Maximum                      512.3292    512.3292
Range                        512.3292    512.3292
Interquartile range (IQR)    22.575      24.701075

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 43.886341        53.895366
Coefficient of variation (CV)      1.4433489        1.5745812
Kurtosis                           37.757534        31.178494
Mean                               30.405914        34.22838
Median Absolute Deviation (MAD)    7.25             7.25
Skewness                           4.8692666        4.7270986
Sum                                13561.038        15265.858
Variance                           1926.011         2904.7105
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values, Dataset A
Value                 Count   Frequency (%)
8.05                  23      5.2%
13                    21      4.7%
26                    17      3.8%
7.8958                17      3.8%
7.75                  14      3.1%
7.2292                12      2.7%
7.925                 11      2.5%
7.8542                9       2.0%
26.55                 8       1.8%
10.5                  8       1.8%
Other values (166)    306     68.6%

Common values, Dataset B
Value                 Count   Frequency (%)
8.05                  22      4.9%
13                    22      4.9%
7.8958                20      4.5%
7.75                  16      3.6%
26                    16      3.6%
10.5                  13      2.9%
7.2292                9       2.0%
26.55                 9       2.0%
7.25                  9       2.0%
7.775                 8       1.8%
Other values (162)    302     67.7%

Smallest values, Dataset A
Value     Count   Frequency (%)
0         7       1.6%
5         1       0.2%
6.4375    1       0.2%
6.4958    1       0.2%
6.75      2       0.4%
6.8583    1       0.2%
6.975     2       0.4%
7.0458    1       0.2%
7.05      3       0.7%
7.0542    1       0.2%

Smallest values, Dataset B
Value     Count   Frequency (%)
0         5       1.1%
4.0125    1       0.2%
5         1       0.2%
6.2375    1       0.2%
6.45      1       0.2%
6.4958    1       0.2%
6.75      2       0.4%
6.975     2       0.4%
7.0458    1       0.2%
7.05      4       0.9%

Cabin
Text

Statistic       Dataset A   Dataset B
Distinct        91          87
Distinct (%)    83.5%       84.5%
Missing         337         343
Missing (%)     75.6%       76.9%
Memory size     7.0 KiB     7.0 KiB

Length

Statistic        Dataset A   Dataset B
Max length       15          15
Median length    3           3
Mean length      3.412844    3.6990291
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       372         381
Distinct characters    18          18
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        74          73
Unique (%)    67.9%       70.9%

Sample

Row        Dataset A   Dataset B
1st row    A24         B78
2nd row    D26         B39
3rd row    D49         C46
4th row    F E69       B41
5th row    E101        D56
Dataset A
Value                Count   Frequency (%)
f                    4       3.2%
e101                 3       2.4%
d36                  2       1.6%
g73                  2       1.6%
b96                  2       1.6%
b98                  2       1.6%
c2                   2       1.6%
e121                 2       1.6%
c92                  2       1.6%
f33                  2       1.6%
Other values (92)    101     81.5%

Dataset B
Value                Count   Frequency (%)
f33                  3       2.5%
c23                  3       2.5%
c25                  3       2.5%
c27                  3       2.5%
d17                  2       1.6%
d20                  2       1.6%
b35                  2       1.6%
b57                  2       1.6%
b59                  2       1.6%
b63                  2       1.6%
Other values (88)    98      80.3%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
B                   36      9.7%
1                   34      9.1%
2                   28      7.5%
3                   27      7.3%
C                   26      7.0%
6                   25      6.7%
4                   22      5.9%
D                   21      5.6%
5                   21      5.6%
9                   20      5.4%
Other values (8)    112     30.1%

Dataset B
Value               Count   Frequency (%)
C                   39      10.2%
B                   36      9.4%
3                   35      9.2%
1                   32      8.4%
2                   31      8.1%
5                   28      7.3%
6                   23      6.0%
7                   21      5.5%
D                   20      5.2%
8                   19      5.0%
Other values (8)    97      25.5%

Most occurring categories, scripts and blocks

All characters (372 in Dataset A, 381 in Dataset B; 100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Embarked
Categorical

Statistic       Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         1           0
Missing (%)     0.2%        0.0%
Memory size     7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
S        327         312
C        80          95
Q        38          39

Length

Statistic        Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

Statistic              Dataset A   Dataset B
Total characters       445         446
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

Row        Dataset A   Dataset B
1st row    S           Q
2nd row    S           S
3rd row    S           S
4th row    S           C
5th row    Q           S

Common Values

Dataset A
Value        Count   Frequency (%)
S            327     73.3%
C            80      17.9%
Q            38      8.5%
(Missing)    1       0.2%

Dataset B
Value   Count   Frequency (%)
S       312     70.0%
C       95      21.3%
Q       39      8.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]
Dataset A
Value   Count   Frequency (%)
s       327     73.5%
c       80      18.0%
q       38      8.5%

Dataset B
Value   Count   Frequency (%)
s       312     70.0%
c       95      21.3%
q       39      8.7%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
S       327     73.5%
C       80      18.0%
Q       38      8.5%

Dataset B
Value   Count   Frequency (%)
S       312     70.0%
C       95      21.3%
Q       39      8.7%

Most occurring categories, scripts and blocks

All characters (445 in Dataset A, 446 in Dataset B; 100.0%) fall in a single (unknown) category, script and block, so the per-category, per-script and per-block character tables repeat the counts above.

Interactions

[Figures: pairwise interaction plots for Dataset A and Dataset B. For some variable pairs the report shows "Interaction plot not present for dataset" under Dataset B.]

Dataset A

2025-03-21T10:30:34.903805image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-21T10:30:36.826365image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-21T10:30:35.315448image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B


Interaction plot not present for dataset

Correlations

Dataset A

                     Age    Embarked        Fare       Parch PassengerId      Pclass         Sex       SibSp    Survived
Age                1.000       0.030       0.168      -0.239       0.028       0.251       0.084      -0.156       0.193
Embarked           0.030       1.000       0.143       0.000       0.000       0.226       0.048       0.000       0.123
Fare               0.168       0.143       1.000       0.395       0.008       0.456       0.203       0.419       0.285
Parch             -0.239       0.000       0.395       1.000      -0.012       0.000       0.270       0.440       0.131
PassengerId        0.028       0.000       0.008      -0.012       1.000       0.000       0.000      -0.067       0.037
Pclass             0.251       0.226       0.456       0.000       0.000       1.000       0.077       0.122       0.362
Sex                0.084       0.048       0.203       0.270       0.000       0.077       1.000       0.193       0.480
SibSp             -0.156       0.000       0.419       0.440      -0.067       0.122       0.193       1.000       0.188
Survived           0.193       0.123       0.285       0.131       0.037       0.362       0.480       0.188       1.000

Dataset B

                     Age    Embarked        Fare       Parch PassengerId      Pclass         Sex       SibSp    Survived
Age                1.000       0.055       0.135       0.324      -0.012       0.299       0.018      -0.194       0.102
Embarked           0.055       1.000       0.210       0.012       0.050       0.297       0.087       0.114       0.192
Fare               0.135       0.210       1.000       0.162      -0.019       0.480       0.199       0.462       0.274
Parch              0.324       0.012       0.162       1.000       0.056       0.000       0.223       0.341       0.140
PassengerId       -0.012       0.050      -0.019       0.056       1.000       0.000       0.089      -0.062       0.122
Pclass             0.299       0.297       0.480       0.000       0.000       1.000       0.184       0.163       0.364
Sex                0.018       0.087       0.199       0.223       0.089       0.184       1.000       0.157       0.582
SibSp             -0.194       0.114       0.462       0.341      -0.062       0.163       0.157       1.000       0.181
Survived           0.102       0.192       0.274       0.140       0.122       0.364       0.582       0.181       1.000
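
One way to read the two correlation tables side by side is to rank variable pairs by how far their coefficients moved between Dataset A and Dataset B. The sketch below is not part of the report output. It copies the coefficients from the tables above into plain Python lists, and the helper name `largest_shifts` is made up for illustration.

```python
# Coefficients copied from the Dataset A and Dataset B correlation tables.
LABELS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
          "Pclass", "Sex", "SibSp", "Survived"]

CORR_A = [
    [1.000, 0.030, 0.168, -0.239, 0.028, 0.251, 0.084, -0.156, 0.193],
    [0.030, 1.000, 0.143, 0.000, 0.000, 0.226, 0.048, 0.000, 0.123],
    [0.168, 0.143, 1.000, 0.395, 0.008, 0.456, 0.203, 0.419, 0.285],
    [-0.239, 0.000, 0.395, 1.000, -0.012, 0.000, 0.270, 0.440, 0.131],
    [0.028, 0.000, 0.008, -0.012, 1.000, 0.000, 0.000, -0.067, 0.037],
    [0.251, 0.226, 0.456, 0.000, 0.000, 1.000, 0.077, 0.122, 0.362],
    [0.084, 0.048, 0.203, 0.270, 0.000, 0.077, 1.000, 0.193, 0.480],
    [-0.156, 0.000, 0.419, 0.440, -0.067, 0.122, 0.193, 1.000, 0.188],
    [0.193, 0.123, 0.285, 0.131, 0.037, 0.362, 0.480, 0.188, 1.000],
]

CORR_B = [
    [1.000, 0.055, 0.135, 0.324, -0.012, 0.299, 0.018, -0.194, 0.102],
    [0.055, 1.000, 0.210, 0.012, 0.050, 0.297, 0.087, 0.114, 0.192],
    [0.135, 0.210, 1.000, 0.162, -0.019, 0.480, 0.199, 0.462, 0.274],
    [0.324, 0.012, 0.162, 1.000, 0.056, 0.000, 0.223, 0.341, 0.140],
    [-0.012, 0.050, -0.019, 0.056, 1.000, 0.000, 0.089, -0.062, 0.122],
    [0.299, 0.297, 0.480, 0.000, 0.000, 1.000, 0.184, 0.163, 0.364],
    [0.018, 0.087, 0.199, 0.223, 0.089, 0.184, 1.000, 0.157, 0.582],
    [-0.194, 0.114, 0.462, 0.341, -0.062, 0.163, 0.157, 1.000, 0.181],
    [0.102, 0.192, 0.274, 0.140, 0.122, 0.364, 0.582, 0.181, 1.000],
]


def largest_shifts(a, b, labels, top=3):
    """Return the `top` variable pairs whose coefficient differs most between a and b.

    Each result is (absolute difference, row label, column label, value in a, value in b).
    Only the upper triangle is scanned because both matrices are symmetric.
    """
    shifts = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            diff = abs(a[i][j] - b[i][j])
            shifts.append((diff, labels[i], labels[j], a[i][j], b[i][j]))
    shifts.sort(key=lambda s: s[0], reverse=True)
    return shifts[:top]


for diff, x, y, in_a, in_b in largest_shifts(CORR_A, CORR_B, LABELS):
    print(f"{x}-{y}: A={in_a:+.3f}  B={in_b:+.3f}  |diff|={diff:.3f}")
```

With these values, the largest shift is Age-Parch (-0.239 in Dataset A against 0.324 in Dataset B), followed by Fare-Parch (0.395 against 0.162). Both involve Parch, which the report appears to type differently in the two datasets (five numeric variables in A, four in B), so those coefficients may not have been computed with the same measure.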

Missing values

Dataset A, Dataset B (three plots per dataset; Matplotlib v3.10.0 SVG images not reproduced here)

Plot 1: A simple visualization of nullity by column.
Plot 2: The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
Plot 3: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
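
The nullity correlation described in the heatmap caption can be sketched as the Pearson correlation between two columns' missing-value indicators (1 where the value is missing, 0 where it is present). The code below is a minimal standard-library illustration under that assumption, not the report's own implementation. The function name `nullity_correlation` and the toy records are hypothetical and are not rows from Dataset A or Dataset B.

```python
from math import sqrt


def nullity_correlation(rows, col_x, col_y):
    """Pearson correlation between the missing-value indicators of two columns.

    `rows` is a list of dicts; a value of None counts as missing.
    Returns None when either column is fully present or fully missing,
    because the indicator then has zero variance and the correlation is undefined.
    """
    x = [1.0 if row[col_x] is None else 0.0 for row in rows]
    y = [1.0 if row[col_y] is None else 0.0 for row in rows]
    n = len(rows)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records for illustration only.
toy = [
    {"Age": 22.0, "Cabin": None, "Fare": 7.25},
    {"Age": None, "Cabin": None, "Fare": 8.05},
    {"Age": 38.0, "Cabin": "C85", "Fare": 71.28},
    {"Age": None, "Cabin": None, "Fare": 8.46},
    {"Age": 54.0, "Cabin": "E46", "Fare": 51.86},
    {"Age": 35.0, "Cabin": None, "Fare": 8.05},
]

print(nullity_correlation(toy, "Age", "Cabin"))  # positive: Age is missing only where Cabin is too
print(nullity_correlation(toy, "Age", "Fare"))   # None: Fare is never missing in the toy data
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no linear relationship. If the data is already in a pandas DataFrame `df`, `df.isnull().corr()` should give the same pairwise quantity for all columns at once.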

Sample

Dataset A

     PassengerId  Survived  Pclass  Name                                           Sex     Age    SibSp  Parch  Ticket    Fare     Cabin  Embarked
360  361          0         3       Skoog, Mr. Wilhelm                             male    40.0   1      4      347088    27.9000  NaN    S
277  278          0         2       Parkes, Mr. Francis "Frank"                    male    NaN    0      0      239853    0.0000   NaN    S
775  776          0         3       Myhrman, Mr. Pehr Fabian Oliver Malkolm        male    18.0   0      0      347078    7.7500   NaN    S
673  674          1         2       Wilhelms, Mr. Charles                          male    31.0   0      0      244270    13.0000  NaN    S
626  627          0         2       Kirkland, Rev. Charles Leonard                 male    57.0   0      0      219533    12.3500  NaN    Q
867  868          0         1       Roebling, Mr. Washington Augustus II           male    31.0   0      0      PC 17590  50.4958  A24    S
124  125          0         1       White, Mr. Percival Wayland                    male    54.0   0      1      35281     77.2875  D26    S
681  682          1         1       Hassab, Mr. Hammad                             male    27.0   0      0      PC 17572  76.7292  D49    C
7    8            0         3       Palsson, Master. Gosta Leonard                 male    2.0    3      1      349909    21.0750  NaN    S
854  855          0         2       Carter, Mrs. Ernest Courtenay (Lilian Hughes)  female  44.0   1      0      244252    26.0000  NaN    S

Dataset B

     PassengerId  Survived  Pclass  Name                                            Sex     Age    SibSp  Parch  Ticket    Fare      Cabin  Embarked
278  279          0         3       Rice, Master. Eric                              male    7.0    4      1      382652    29.1250   NaN    Q
158  159          0         3       Smiljanic, Mr. Mile                             male    NaN    0      0      315037    8.6625    NaN    S
554  555          1         3       Ohman, Miss. Velin                              female  22.0   0      0      347085    7.7750    NaN    S
31   32           1         1       Spencer, Mrs. William Augustus (Marie Eugenie)  female  NaN    1      0      PC 17569  146.5208  B78    C
413  414          0         2       Cunningham, Mr. Alfred Fleming                  male    NaN    0      0      239853    0.0000    NaN    S
490  491          0         3       Hagland, Mr. Konrad Mathias Reiersen            male    NaN    1      0      65304     19.9667   NaN    S
588  589          0         3       Gilinski, Mr. Eliezer                           male    22.0   0      0      14973     8.0500    NaN    S
539  540          1         1       Frolicher, Miss. Hedwig Margaritha              female  22.0   0      2      13568     49.5000   B39    C
541  542          0         3       Andersson, Miss. Ingeborg Constanzia            female  9.0    4      2      347082    31.2750   NaN    S
741  742          0         1       Cavendish, Mr. Tyrell William                   male    36.0   1      0      19877     78.8500   C46    S

Dataset A

     PassengerId  Survived  Pclass  Name                                              Sex     Age    SibSp  Parch  Ticket            Fare      Cabin  Embarked
729  730          0         3       Ilmakangas, Miss. Pieta Sofia                     female  25.00  1      0      STON/O2. 3101271  7.9250    NaN    S
206  207          0         3       Backstrom, Mr. Karl Alfred                        male    32.00  1      0      3101278           15.8500   NaN    S
318  319          1         1       Wick, Miss. Mary Natalie                          female  31.00  0      2      36928             164.8667  C7     S
440  441          1         2       Hart, Mrs. Benjamin (Esther Ada Bloomfield)       female  45.00  1      1      F.C.C. 13529      26.2500   NaN    S
871  872          1         1       Beckwith, Mrs. Richard Leonard (Sallie Monypeny)  female  47.00  1      1      11751             52.5542   D35    S
174  175          0         1       Smith, Mr. James Clinch                           male    56.00  0      0      17764             30.6958   A7     C
755  756          1         2       Hamalainen, Master. Viljo                         male    0.67   1      1      250649            14.5000   NaN    S
640  641          0         3       Jensen, Mr. Hans Peder                            male    20.00  0      0      350050            7.8542    NaN    S
603  604          0         3       Torber, Mr. Ernst William                         male    44.00  0      0      364511            8.0500    NaN    S
292  293          0         2       Levy, Mr. Rene Jacques                            male    36.00  0      0      SC/Paris 2163     12.8750   D      C

Dataset B

     PassengerId  Survived  Pclass  Name                            Sex     Age    SibSp  Parch  Ticket      Fare     Cabin  Embarked
714  715          0         2       Greenberg, Mr. Samuel           male    52.0   0      0      250647      13.0000  NaN    S
749  750          0         3       Connaghton, Mr. Michael         male    31.0   0      0      335097      7.7500   NaN    Q
839  840          1         1       Marechal, Mr. Pierre            male    NaN    0      0      11774       29.7000  C47    C
503  504          0         3       Laitinen, Miss. Kristina Sofia  female  37.0   0      0      4135        9.5875   NaN    S
584  585          0         3       Paulner, Mr. Uscher             male    NaN    0      0      3411        8.7125   NaN    C
778  779          0         3       Kilgannon, Mr. Thomas J         male    NaN    0      0      36865       7.7375   NaN    Q
693  694          0         3       Saad, Mr. Khalil                male    25.0   0      0      2672        7.2250   NaN    C
672  673          0         2       Mitchell, Mr. Henry Michael     male    70.0   0      0      C.A. 24580  10.5000  NaN    S
454  455          0         3       Peduzzi, Mr. Joseph             male    NaN    0      0      A/5 2817    8.0500   NaN    S
396  397          0         3       Olsson, Miss. Elina             female  31.0   0      0      350407      7.8542   NaN    S

Duplicate rows

Dataset A

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.